On the Unicity Distance of Stego Key

Weiming Zhang, and Shiqu Li 

Abstract — Steganography is about how to send secret message covertly. 
The purpose of steganalysis is not only to detect the existence of the
hidden message but also to extract it. So far many reliable detection
methods have been proposed for various steganographic algorithms, while
few approaches can extract the hidden information. In this paper,
the difficulty of extracting hidden information, which is essentially a
kind of privacy, is analyzed with information-theoretic methods in
terms of the unicity distance of the steganographic key (abbreviated stego key).
A lower bound for the unicity distance is obtained, which shows the
relations between key rate, message rate, hiding capacity and difficulty of
extraction. Furthermore, the extracting attack on steganography is viewed
as a special kind of cryptanalysis, and an effective method for recovering
the stego key of the popular LSB replacing steganography in spatial images
is presented by combining the detection techniques of steganalysis with
the correlation attack of cryptanalysis. The analysis of this method
and the experimental results on the steganographic software "Hide and Seek
4.1" both agree with the information-theoretic conclusion.

Index Terms — cryptanalysis, steganalysis, unicity distance, extracting 
attack, correlation attack, "Hide and Seek 4.1". 

I. Introduction 

Steganography is an important branch of information hiding; it is
about how to send secret messages covertly. Attacks on
steganography (i.e. steganalysis) mainly include passive attacks, active
attacks, and extracting attacks. A passive attacker only wants to detect
the existence of the embedded message, while an active attacker
wants to destroy it. The purpose of an extracting attacker is to obtain
the message embedded into the innocent data. Correspondingly, there are
three kinds of security, one for each attack: detectability,
robustness and difficulty of extraction.

Theoretical studies of steganography have mostly concerned
detectability, and much of the literature models
detectability with information-theoretic methods or in terms of
computational complexity [1]-[4]. On the other hand, references [5]-[7]
treat the information hiding problem with active attackers as a
"capacity game", and define robustness using the "hiding capacity".
Although robustness is mainly a concern in watermarking,
as a measure of efficiency it is also important for steganography.
References [8]-[11] analyze the relation between detectability
and robustness.

As in the theoretical field, the study of practical steganalysis
has also centered on detection techniques, and there are
many detection methods for a variety of steganographic algorithms,
such as [12]-[14]. However, there are only a few papers on
extracting attacks. Chandramouli [15] studies how to mount an extracting
attack on spread spectrum steganography in a special scenario in
which the same message is sent twice in the same image with
different strength factors. Fridrich et al. [16] show how to obtain the
hidden message by recovering the key of LSB steganography
on JPEG images, such as F5 [17] and Outguess [18]. Recently,
in [19], Fridrich et al. extend their approach to the spatial domain.
Another extracting approach to LSB steganography on JPEG images
is presented by Ma et al. [20].

The extracting attack on steganography can be viewed as a special
kind of cryptanalysis. In fact, most steganographic systems require the
message to be encrypted before it is hidden. Therefore,
when facing the model of "encryption + hiding", a cryptanalyst has
to analyze a "multiple cipher". Fridrich et al. [16] analyze the
complexity of searching for the stego key: if there is some recognizable
structure in the steganographic communication, one can use it as a
sign when searching for the key by dictionary attack or brute-force search;
otherwise, the search has to try all encryption keys for every
possible stego key, so the complexity of brute-force search becomes
proportional to the product of the numbers of stego keys and crypto keys.
That means that extraction and decryption have to be done together.
Obviously a cryptanalyst hopes that the two tasks can be carried out
independently, and the extracting attack solves exactly this problem:
how to extract the embedded sequence regardless of the encryption
algorithm.

In this paper, the difficulty of extraction, which is essentially a kind
of privacy, is studied with information-theoretic methods in terms
of the unicity distance of the stego key. The unicity distance is the minimum
amount of data needed by the attacker to recover the stego key, which
captures precisely the concept of "difficulty of extraction" for key-based
steganography. The relations between key rate, message rate,
hiding capacity and unicity distance are analyzed. It is proved
that the lower bound obtained for the unicity distance is directly
proportional to the entropy of the stego key, and inversely proportional
to the "hiding redundancy", which is the
difference between the hiding capacity and the message rate.

As mentioned above, our conclusion comes from the basic idea that
the extracting attack on steganography is a special kind of cryptanalysis.
Therefore this problem can be solved by combining traditional
techniques of cryptanalysis and steganalysis. As an example,
we present an extracting approach for random LSB replacing
steganography of spatial images, which is based on some detection
techniques of steganalysis and the idea of the correlation attack [21] in
cryptanalysis. One contribution of our attack is that it can accurately
estimate the amount of necessary data. With this method, we mount
a successful extracting attack on the steganographic software "Hide and
Seek 4.1" [22], which was recently found in the United States [23].
The experimental results on "Hide and Seek 4.1" agree with the
analysis of our extracting algorithm, which also supports the validity
of the information-theoretic conclusion.

The rest of this paper is organized as follows. The main theorem
on the unicity distance of the stego key is given in Sect. II. In Sect.
III a method of recovering the stego key, the "correlation attack", on
LSB replacing steganography of spatial images is presented. The
experimental results of attacking "Hide and Seek 4.1" are given in
Sect. IV. The paper concludes with a discussion in Sect. V.

II. Information-Theoretic Analysis for the Unicity 
Distance of Stego key 
A. Notations and Definitions 

For the information-theoretic analysis, we use the following notation.
Random variables are denoted by capital letters (e.g. $X$),
and their realizations by the respective lower case letters (e.g. $x$). The
domains over which random variables are defined are denoted by script
letters (e.g. $\mathcal{X}$). Sequences of $N$ random variables are denoted with a
superscript (e.g. $X^N = (X_1, X_2, \ldots, X_N)$, which takes its values on
the product set $\mathcal{X}^N$). We denote entropy and conditional entropy
by $H(\cdot)$ and $H(\cdot|\cdot)$ respectively.

A general model of a stegosystem can be described as follows.
The embedded data $M$ is hidden in innocuous data $X$, usually
called the cover object, under the control of a secret stego key $K$, producing
the stego object $\tilde{X}$. The stego key is shared between the sender and
the receiver but is secret to any third party. The receiver can extract
$M$ from $\tilde{X}$ with the stego key $K$. An extracting attacker wants to
recover the embedded message or the stego key from the stego
object (possibly using some side information, for example partial
knowledge about the cover object).

Assume that the cover object is a sequence $X^N =
(X_1, X_2, \ldots, X_N)$ of independent and identically distributed (i.i.d.)
samples from $P(x)$. Because the embedded message usually is cipher
text, we assume that it is a sequence $M^N = (M_1, M_2, \ldots, M_N)$ of
independent and uniformly distributed symbols, independent of $X^N$. The
stego key $K$ is independent of the message and the cover object.

Now we give the formal definition of a steganographic code
introduced by Moulin et al. [7], [24]. First of all, the embedding
algorithm of a stegosystem should preserve transparency, which can be
guaranteed by a distortion constraint. A distortion function is a
nonnegative function $d : \mathcal{X} \times \mathcal{X} \to \mathbb{R}^+ \cup \{0\}$, which can be extended
to one on $N$-tuples by

$$d(x^N, y^N) = \frac{1}{N} \sum_{i=1}^{N} d(x_i, y_i).$$

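Definition 1 [7]: A length-$N$ steganographic code subject to
distortion $D$ is a triple $(\mathcal{M}, f_N, \phi_N)$, where

• $\mathcal{M}$ is the message set of cardinality $|\mathcal{M}|$;

• $f_N : \mathcal{X}^N \times \mathcal{M} \times \mathcal{K} \to \mathcal{X}^N$ is the embedding algorithm
mapping a sequence $x^N$, a message $m$ and a key $k$ to a sequence
$\tilde{x}^N = f_N(x^N, m, k)$. This mapping is subject to the distortion
constraint

$$\sum_{x^N \in \mathcal{X}^N} \sum_{k \in \mathcal{K}} \sum_{m \in \mathcal{M}} \frac{1}{|\mathcal{M}|} P(x^N) P(k)\, d(x^N, f_N(x^N, m, k)) \le D;$$

• $\phi_N : \mathcal{X}^N \times \mathcal{K} \to \mathcal{M}$ is the extracting algorithm mapping
the received sequence $\tilde{x}^N$ with the key $k$ to a decoded message
$\hat{m} = \phi_N(\tilde{x}^N, k)$.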

A cover channel is a conditional p.m.f. (probability mass function)
$q(\tilde{x}|x) : \mathcal{X} \to \mathcal{X}$. The compound cover channel subject to distortion
$D$ is the set

$$\mathcal{Q} = \Bigl\{ q(\tilde{x}|x) : \sum_{x,\tilde{x}} d(x,\tilde{x})\, q(\tilde{x}|x) P(x) \le D \Bigr\}.$$

The length-$N$ memoryless extension of the channel is the conditional
p.m.f.

$$q(\tilde{x}^N|x^N) = \prod_{i=1}^{N} q(\tilde{x}_i|x_i), \quad \forall N \ge 1.$$

For a length-$N$ steganographic code, define the message rate and key
rate as

$$R_m = \frac{H(M)}{N}, \qquad R_k = \frac{H(K)}{N}$$

respectively, and define the probability of error as $P_{e,N} =
P(\phi_N(\tilde{X}^N, K) \ne M)$. The hiding capacity is the supremum of all
achievable message rates of steganographic codes subject to distortion
$D$ under the condition of zero probability of error (i.e. $P_{e,N} \to 0$
as $N \to \infty$).

Because we disregard the active attacker and assume that $K$ is
independent of $M$ and $X$, the results of [7], [24] imply that the
hiding capacity of a steganographic code is given
by

$$C(D) = \max_{q(\tilde{x}|x) \in \mathcal{Q}} H(\tilde{X}|X). \tag{1}$$

Because $C(D)$ is the maximum of the conditional entropy over
all cover channels subject to distortion $D$, $C(D)$ reflects the
hiding ability of the cover object within the distortion constraint. So
we refer to $C(D) - R_m$ as the hiding redundancy, which reflects
the hiding capability of the steganographic code.



B. Unicity Distance of Stego-key 

According to Kerckhoffs' principle, the security of a steganographic
code should be based on nothing but the secrecy of the stego
key. Therefore, it is important to analyze the key equivocation.
Specifically, we want to know how much data the attacker must use
to recover the stego key, i.e. the unicity distance of the stego key. We
analyze this problem under two kinds of attack conditions.
One is the stego-only extracting attack, in which the attacker can only get the
stego objects; the other is the known-cover extracting attack, in
which the attacker can get not only the stego objects but also some
corresponding cover objects. We begin the analysis with the
known-cover attack.

Theorem 1: Let $(\mathcal{M}, f_N, \phi_N)$ be a length-$N$ steganographic code
subject to distortion $D$ with zero probability of error, i.e. for any given
$\epsilon > 0$, $P_{e,N} = P(\phi_N(\tilde{X}^N, K) \ne M) < \epsilon$. Then, for a given sequence
of $n$ ($n$ large enough) pairs of cover objects and stego objects, the
expected number of spurious stego keys $\bar{S}_n$ for the known-cover extracting
attack has the lower bound

$$\bar{S}_n \ge \frac{2^{H(K)}}{2^{nN(C(D) - R_m + \epsilon)}} - 1,$$

where $C(D) = \max_{q(\tilde{x}|x) \in \mathcal{Q}} H(\tilde{X}|X)$ is the hiding capacity and
$R_m = \frac{H(M)}{N}$ is the message rate.

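Proof: For a given sequence of pairs of cover objects and stego
objects $(x^N, \tilde{x}^N)^n$, the set of possible stego keys is defined as

$$\mathcal{K}((x^N, \tilde{x}^N)^n) = \{ k \in \mathcal{K} \mid \exists\, m^n \in \mathcal{M}^n \text{ such that } P(m^n) > 0 \text{ and } f_N^n(x^{Nn}, m^n, k) = \tilde{x}^{Nn} \},$$

where

$$f_N^n(x^{Nn}, m^n, k) = (f_N(x_1^N, m_1, k), \ldots, f_N(x_n^N, m_n, k)) = (\tilde{x}_1^N, \ldots, \tilde{x}_n^N) = \tilde{x}^{Nn}.$$

So the number of spurious stego keys for the observed $(x^N, \tilde{x}^N)^n$ is
$|\mathcal{K}((x^N, \tilde{x}^N)^n)| - 1$, and the expected number of spurious stego keys is
given by

$$\bar{S}_n = \sum_{(x^N,\tilde{x}^N)^n} P((x^N,\tilde{x}^N)^n) \bigl[ |\mathcal{K}((x^N,\tilde{x}^N)^n)| - 1 \bigr]
= \sum_{(x^N,\tilde{x}^N)^n} P((x^N,\tilde{x}^N)^n)\, |\mathcal{K}((x^N,\tilde{x}^N)^n)| - 1.$$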

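Using Jensen's inequality, we get

$$\begin{aligned}
H(K|X^{Nn}, \tilde{X}^{Nn})
&= \sum_{(x^N,\tilde{x}^N)^n} P((x^N,\tilde{x}^N)^n)\, H(K|(x^N,\tilde{x}^N)^n) \\
&\le \sum_{(x^N,\tilde{x}^N)^n} P((x^N,\tilde{x}^N)^n) \log_2 |\mathcal{K}((x^N,\tilde{x}^N)^n)| \\
&\le \log_2 \sum_{(x^N,\tilde{x}^N)^n} P((x^N,\tilde{x}^N)^n)\, |\mathcal{K}((x^N,\tilde{x}^N)^n)| \\
&= \log_2(\bar{S}_n + 1).
\end{aligned} \tag{2}$$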

On the other hand, $f_N^n(x^{Nn}, m^n, k) = \tilde{x}^{Nn}$ implies
$H(\tilde{X}^{Nn}|X^{Nn}, M^n, K) = 0$, which, together with the assumptions
that the key is independent of the message and the cover object, the message is
independent of the cover object, and the sequences $X^{Nn}$ and $M^n$ are
both i.i.d. sequences of random variables, yields

$$\begin{aligned}
H(X^{Nn}, \tilde{X}^{Nn}, M^n, K)
&= H(\tilde{X}^{Nn}|X^{Nn}, M^n, K) + H(X^{Nn}, M^n, K) \\
&= H(X^{Nn}, M^n) + H(K) \\
&= NnH(X) + nH(M) + H(K).
\end{aligned} \tag{3}$$
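Since the steganographic code satisfies zero probability of error, we
have, for any given $\epsilon > 0$,

$$\begin{aligned}
P(\phi_N^n(\tilde{X}^{Nn}, K) \ne M^n)
&= P\bigl( (\phi_N(\tilde{X}_1^N, K), \ldots, \phi_N(\tilde{X}_n^N, K)) \ne (M_1, \ldots, M_n) \bigr) \\
&= P\bigl( \exists\, i \text{ such that } 1 \le i \le n \text{ and } \phi_N(\tilde{X}_i^N, K) \ne M_i \bigr) \\
&\le \sum_{i=1}^{n} P(\phi_N(\tilde{X}_i^N, K) \ne M_i) \\
&\le n\epsilon.
\end{aligned} \tag{4}$$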

Equation (4) with Fano's inequality implies that, for any given $\epsilon > 0$,

$$H(M^n|\tilde{X}^{Nn}, K) \le n\epsilon. \tag{5}$$

Furthermore, because $X^{Nn}$ is an i.i.d. sequence of random
variables and the cover channel is memoryless, we obtain

$$\begin{aligned}
H(X^{Nn}, \tilde{X}^{Nn}, M^n, K)
&= H(X^{Nn}) + H(\tilde{X}^{Nn}|X^{Nn}) + H(K|X^{Nn}, \tilde{X}^{Nn}) + H(M^n|X^{Nn}, \tilde{X}^{Nn}, K) \\
&\le NnH(X) + NnH(\tilde{X}|X) + H(K|X^{Nn}, \tilde{X}^{Nn}) + H(M^n|\tilde{X}^{Nn}, K) \\
&\le NnH(X) + NnH(\tilde{X}|X) + H(K|X^{Nn}, \tilde{X}^{Nn}) + n\epsilon.
\end{aligned} \tag{6}$$

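Combining (3) and (6) yields that, for any given $\epsilon > 0$,

$$H(K|X^{Nn}, \tilde{X}^{Nn}) \ge H(K) + nH(M) - NnH(\tilde{X}|X) - n\epsilon, \tag{7}$$

which, together with (2), implies that, for any given $\epsilon > 0$,

$$\log_2(\bar{S}_n + 1) \ge H(K) + nH(M) - NnH(\tilde{X}|X) - n\epsilon,$$

i.e.

$$\bar{S}_n \ge \frac{2^{H(K)}}{2^{n(NH(\tilde{X}|X) - H(M) + \epsilon)}} - 1. \tag{8}$$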



Since the hiding capacity satisfies $C(D) = \max_{q(\tilde{x}|x) \in \mathcal{Q}} H(\tilde{X}|X)$ and
$R_m = \frac{H(M)}{N}$, we have, for any given $\epsilon > 0$,

$$\bar{S}_n \ge \frac{2^{H(K)}}{2^{nN(C(D) - R_m + \epsilon)}} - 1.$$



Definition 2: The unicity distance $n_0$ for a steganographic code
under known-cover extracting attack is the minimum number of
pairs of cover objects and stego objects for which
the expected number of spurious stego keys equals zero. The unicity
distance $n_1$ for a steganographic code under stego-only extracting
attack is the minimum number of stego objects for which
the expected number of spurious stego keys equals zero.

It is easy to see that $n_1 \ge n_0$. Using Theorem 1 we
obtain the following important corollary.

Corollary 2: The unicity distance $n_0$ for the known-cover extracting
attack and $n_1$ for the stego-only extracting attack satisfy, for any
given $\epsilon > 0$,

$$n_1 \ge n_0 \ge \frac{R_k}{C(D) - R_m + \epsilon},$$

where $C(D) = \max_{q(\tilde{x}|x) \in \mathcal{Q}} H(\tilde{X}|X)$ is the hiding capacity,
$R_m = \frac{H(M)}{N}$ is the message rate and $R_k = \frac{H(K)}{N}$ is the key rate.

Corollary 2 shows that a larger key rate $R_k$ and a smaller hiding
redundancy $C(D) - R_m$ make extraction more difficult.
The former is clear; for the latter, we give the following intuitive
explanation. A smaller hiding redundancy means a message
rate better matched to the cover channel. In this case, processing
the stego objects (e.g. sampling them) with the correct key and with a
spurious key yields only small differences. In other words, it is
difficult for the extracting attacker to distinguish the correct
key from spurious ones.
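The step from Theorem 1 to Corollary 2 can be spelled out as follows. This is only a short sketch using the quantities already defined, under the assumption that $C(D) - R_m + \epsilon > 0$: if the expected number of spurious keys is zero, the lower bound of Theorem 1 cannot be positive.

```latex
\begin{align*}
\bar{S}_n = 0
  &\;\Longrightarrow\; \frac{2^{H(K)}}{2^{nN(C(D)-R_m+\epsilon)}} - 1 \le 0 \\
  &\;\Longrightarrow\; H(K) \le nN\bigl(C(D)-R_m+\epsilon\bigr) \\
  &\;\Longrightarrow\; n \ge \frac{H(K)/N}{C(D)-R_m+\epsilon}
     = \frac{R_k}{C(D)-R_m+\epsilon}.
\end{align*}
```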

C. The Analysis for LSB Steganography 

As an example, we use the results of the preceding subsection to
analyze the most popular steganographic mechanism, i.e. random
LSB steganography on images, such as F5 [17], Outguess [18] and
"Hide and Seek" [22].

LSB replacing steganography usually works in the following manner.
First, select an image with $N$ DCT coefficients for JPEG images
(or $N$ pixels for spatial images), denoted by $C = (c_1, \ldots, c_N)$.
Then randomly pick a subset of samples, $\{c_{j_1}, \ldots, c_{j_L}\}$, using a
Pseudo-Random Number Generator (PRNG) seeded with
a stego key $k$ belonging to the key space $\mathcal{K}$, i.e. the PRNG with
$k$ generates an embedding path $\{j_1, \ldots, j_L\}$. Finally, embed the
message sequence $M = (m_1, \ldots, m_L)$, where $m_i \in \{0, 1\}$, by
replacing the LSBs of $\{c_{j_1}, \ldots, c_{j_L}\}$, or by other embedding operations
such as $\pm 1$ on the DCT coefficients (or pixels), and generate the stego
image $S = (s_1, \ldots, s_N)$. Two kinds of embedding operations are
shown in Table I and Table II respectively.

TABLE I
LSB REPLACING EMBEDDING OPERATION

Sample value             2i      2i      2i+1    2i+1
Embedded message bit     0       1       0       1
Modified sample value    2i      2i+1    2i      2i+1

TABLE II
±1 EMBEDDING OPERATION

Sample value             2i      2i              2i+1          2i+1
Embedded message bit     0       1               0             1
Modified sample value    2i      2i+1 or 2i-1    2i or 2i+2    2i+1
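The LSB replacing operation of Table I, applied along a key-dependent embedding path, can be sketched as below. This is a minimal illustration only: Python's `random.Random` stands in for the PRNG (real tools such as those cited use their own generators), and the function names `embed_lsb_replace` and `extract_lsb` are hypothetical.

```python
import random


def embed_lsb_replace(cover, bits, key):
    """Embed `bits` into the LSBs of pseudo-randomly selected samples."""
    rng = random.Random(key)  # stand-in PRNG seeded with the stego key
    # Embedding path j_1, ..., j_L: distinct positions chosen by the PRNG.
    path = rng.sample(range(len(cover)), len(bits))
    stego = list(cover)
    for j, bit in zip(path, bits):
        # Table I: a sample 2i or 2i+1 becomes 2i + bit.
        stego[j] = (stego[j] & ~1) | bit
    return stego, path


def extract_lsb(stego, length, key):
    """Regenerate the path from the key and read the LSBs back."""
    rng = random.Random(key)
    path = rng.sample(range(len(stego)), length)
    return [stego[j] & 1 for j in path]
```

A receiver who knows the key and the message length regenerates the same path, so `extract_lsb(stego, len(bits), key)` returns the embedded bits; a different key visits different positions and yields unrelated LSBs.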



The embedding rate $r$ is defined as the ratio of the length of the
message to that of the image, i.e. $r = L/N$. It is also the probability
that a DCT coefficient (or pixel) is selected to carry one message bit,
because the message is randomly scattered over the whole
image. Since the message sequence $M$ is usually cipher text, we assume
that $M$ is uniformly distributed and independent of $C$; therefore
every pixel is modified with probability $r/2$. In fact, the LSBs of images
resemble noise and are thus approximately uniformly distributed
and independent of $M$, so the assumption of a modification rate of
$r/2$ is also reasonable for plain-text $M$.

When using Corollary 2 we have to compute the hiding capacity,
which is hard in general. However, if the cover objects are binary
sequences with distribution Bernoulli($\frac{1}{2}$) and the distortion
metric is the Hamming metric, the hiding capacity is given in [24] as

$$C(D) = \begin{cases} H(D), & \text{if } 0 \le D \le \frac{1}{2} \\ 1, & \text{if } D > \frac{1}{2} \end{cases} \tag{9}$$

JOURNAL OF LATEX CLASS FILES, VOL. 1, NO. 11, NOVEMBER 2002







Fig. 1. Hiding redundancy: the curve denotes the "hiding capacity" $H(r/2)$,
the straight line stands for the message rate $r$, and the difference between
them is the hiding redundancy (horizontal axis: embedding rate $r$, from 0 to 1).

where $H(D) = -D \log_2 D - (1 - D) \log_2(1 - D)$.

To analyze LSB steganography, for simplicity we take the LSBs
of the DCT coefficients (or pixels) as cover objects, which approximately
follow the distribution Bernoulli($\frac{1}{2}$). When the
embedding rate is $r$ ($0 < r < 1$), the message rate is $R_m = \frac{H(M)}{N} =
r$ bits per symbol (note that $R_m$ has a unit but the embedding rate $r$ has not)
and the Hamming distortion is $r/2$. Therefore (9) implies that the hiding
capacity is $H(r/2)$, and the hiding redundancy is $H(r/2) - r$.

In Fig. 1 it is clear that when $r \to 0$ (or $r \to 1$), the redundancy
of the cover channel $H(r/2) - r \to 0$, and Corollary 2 then implies
that the unicity distance of the stego key tends to infinity, i.e. it is hard for
the attack to succeed.
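The behaviour of the hiding redundancy and of the bound in Corollary 2 for LSB steganography can be checked numerically. The sketch below assumes the Bernoulli(1/2) cover model and Hamming distortion $r/2$ used above; the function names and the `eps` parameter are illustrative choices, not part of the original analysis.

```python
import math


def binary_entropy(d):
    """H(D) = -D log2 D - (1 - D) log2 (1 - D), with H(0) = H(1) = 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log2(d) - (1.0 - d) * math.log2(1.0 - d)


def hiding_redundancy(r):
    """H(r/2) - r for embedding rate r in [0, 1]."""
    return binary_entropy(r / 2.0) - r


def unicity_lower_bound(key_bits, n_samples, r, eps=0.0):
    """R_k / (C(D) - R_m + eps) with R_k = key_bits / n_samples."""
    r_k = key_bits / n_samples
    redundancy = hiding_redundancy(r) + eps
    return math.inf if redundancy <= 0.0 else r_k / redundancy
```

Evaluating `unicity_lower_bound` for rates near 0 or 1 gives much larger values than for moderate rates, which is the tendency described above; the absolute values are small because the bound is a lower bound only.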

III. Extracting Attack on LSB Replacing 
Steganography of Spatial Images 

Reference [16] presents an extracting attack on LSB steganography
of JPEG images (such as F5 and Outguess), and [19] mounts an
extracting attack on LSB (replacing or ±1) steganography of spatial
images. The purpose of both attacks is to recover the stego
key, and the experimental results show the same phenomenon: the
attacks need more data for small or large embedding rates
$r$, and when $r \to 0$ (or $r \to 1$) the attacks fail, which is consistent
with the information-theoretic conclusion in Sect. II. However, on
the other hand, it should be noted that the analysis in Sect. II is
based on some general assumptions, and the lower bound in Corollary
2 is obtained for the known-cover attack, although it is also a lower
bound for the stego-only attack. Therefore the results of the preceding section
only reflect the tendency of the difficulty of recovering the stego
key and cannot be used to estimate the amount of data needed by
the attacker. Moreover, the methods of [16] and [19] are both based on
nonparametric hypothesis testing, for which it is hard to calculate
the necessary number of samples. We now present a new stego key
searching method for LSB replacing steganography of spatial images
based on parametric hypothesis testing, which is efficient and simpler
than the preceding methods. The main contribution of our attack is that
it can accurately estimate the amount of necessary data, which is
important because with too little data we cannot get the stego key, while
too much data slows down the search.

Our method is also an example of how to mount an extracting attack
by combining traditional techniques of cryptanalysis and steganalysis.
The main ideas are as follows. First, estimate the length
of the message (the embedding rate) with some detection method.
Then filter the stego image to get the data of its noise area,
which can be thought of as a sample from a mixture distribution [25]
with the mixing parameter being a function of the embedding rate.
By analyzing this mixture distribution, we can exploit an
"accordant advantage" of the correct stego key over spurious
ones. Finally, with this accordant advantage, carry out a correlation attack
as in cryptanalysis to obtain the stego key.

We mount the extracting attack under the assumption that we have a stego
image and know the steganographic algorithm, so that the only thing we
do not know is the stego key. This assumption is similar to that
of cryptanalysis. In this paper, 8-bit grayscale images are taken as
examples to describe our method, and the same notation as in
Sect. II (C) is used. Specifically, denote the cover image and the stego
image with $N$ pixels by $C = (c_1, \ldots, c_N)$ and $S = (s_1, \ldots, s_N)$
respectively, where $c_i, s_i \in [0, 255]$ and $1 \le i \le N$. The stego key
$k$, belonging to the key space $\mathcal{K}$, is the seed of the PRNG. The
message sequence is denoted by $M = (m_1, \ldots, m_L)$. Notice that,
as mentioned in Sect. I, the message is usually required to be encrypted
before it is embedded into images, which is why recovering the stego key
by simple brute-force search has to consider the encryption key at
the same time. The purpose of our method is to get the stego key
$k$ regardless of the encryption key, given only the stego image $S$.

A. A Mixture Distribution Model of Stego Images' Noise

LSB steganography essentially hides the message in the noise area
of the image. Therefore we analyze the noise data of the stego image.
First, filter the stego image $S = (s_1, \ldots, s_N)$ with a spatial average
filter to get a "new image" $\bar{S} = \{\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_N\}$. Note that the
$\bar{s}_i$ are stored as real numbers, i.e. several decimal digits are kept
when averaging pixels. Then take the difference between the pixels of $S$
and $\bar{S}$ as the noise data. For $1 \le i \le N$, if $s_i$ is odd, the noise data
is defined as $w_i = s_i - \bar{s}_i$, and if $s_i$ is even, $w_i = \bar{s}_i - s_i$. The set
of noise data is denoted by $W = \{w_1, w_2, \ldots, w_N\}$.
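The noise extraction step can be sketched as follows. The text does not specify the size of the spatial average filter, so the 3x3 window (centre pixel included, borders clamped) used here is an assumption, as are the function names; only the sign convention for odd and even pixels is taken from the definition above.

```python
def mean_filter_3x3(img):
    """Average each pixel with its 8 neighbours (assumed window; edges clamped).

    `img` is a list of rows of integer grey levels; results are kept as floats.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def noise_data(img):
    """Return w_i = s_i - s_bar_i for odd s_i and s_bar_i - s_i for even s_i."""
    smooth = mean_filter_3x3(img)
    noise = []
    for row, smooth_row in zip(img, smooth):
        for s, s_bar in zip(row, smooth_row):
            noise.append(s - s_bar if s % 2 == 1 else s_bar - s)
    return noise
```

With this sign convention a pixel changed by LSB replacement is pushed towards a positive noise value whether it ended up odd or even, which is what the mixture model below relies on.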

It is reasonable to assume that the noise data $w_i$ corresponding
to unmodified $s_i$ are approximately a sample of Gaussian
white noise, i.e. of a normal distribution with mean 0
and variance $\sigma^2$. If the pixel $s_i$ in the $i$th position has been modified
in the embedding process, 1 has been added to $c_i$ when $s_i$ is odd, and
1 has been subtracted from $c_i$ when $s_i$ is even, as shown in Table I.
Therefore the $w_i$ corresponding to modified $s_i$ can be viewed as a
sample from a normal distribution with mean 1 and the same variance
$\sigma^2$. Here we ignore the influence of modified pixels around
position $i$, because this kind of influence is counteracted by averaging
them. Both assumptions have been verified by experimental
results on many images. When the embedding rate is $r$, on average
a fraction $r/2$ of the pixels in $S$ have been modified. So $W = \{w_1, w_2, \ldots, w_N\}$ is a
sample from the mixture distribution

$$F_r(x) = \Bigl(1 - \frac{r}{2}\Bigr) F(x) + \frac{r}{2}\, G(x), \tag{10}$$

where $F(x)$ and $G(x)$ are the distribution functions of the normal
distributions $N(0, \sigma^2)$ and $N(1, \sigma^2)$ respectively.

For $k \in \mathcal{K}$, let $I(k)$ denote the set of sample indices visited along
the path generated from the key $k$. If $k$ is a spurious key, $\{w_j\}_{j \in I(k)}$
is a random sample from distribution (10). On the other hand, if $k$ is
the correct key $k_0$, on average 50% of the samples in $\{w_j\}_{j \in I(k_0)}$
are from the distribution $F(x)$ and the other 50% are from the
distribution $G(x)$. So in this case, $\{w_j\}_{j \in I(k_0)}$ is a random sample
from the mixture distribution

$$F_1(x) = \frac{1}{2} F(x) + \frac{1}{2} G(x). \tag{11}$$

When $0 < r < 1$, the difference between distributions (10) and (11)
can be used to distinguish the correct key from spurious ones.

B. Accordant Advantage

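To exploit the difference between the mixture distributions (10) and
(11), let $X_0$ be a random variable with distribution function $F(x)$, let $X_1$
be a random variable with distribution function $G(x)$, and let $\alpha_0 = P\{X_0 >
\lambda\}$ and $\alpha_1 = P\{X_1 > \lambda\}$, where the threshold $\lambda$ is a real number larger than
zero. Then

$$\alpha_0 = \int_{\lambda}^{+\infty} dF(x) = \int_{\lambda}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Bigl\{ -\frac{x^2}{2\sigma^2} \Bigr\}\, dx, \tag{12}$$

$$\alpha_1 = \int_{\lambda}^{+\infty} dG(x) = \int_{\lambda}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Bigl\{ -\frac{(x-1)^2}{2\sigma^2} \Bigr\}\, dx. \tag{13}$$

Write $\Delta\alpha = \alpha_1 - \alpha_0$. It is easy to prove that $\Delta\alpha > 0$.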

As mentioned above, for the correct key $k_0$, the sample of the noise
data set $\{w_j\}_{j \in I(k_0)}$ can be modeled as realizations of a random
variable $Y_0$ whose distribution function is (11), while for an incorrect
key $k$, the sample $\{w_j\}_{j \in I(k)}$ can be viewed as realizations of a
random variable $Y_1$ whose distribution function is (10). Let $p_0 =
P(Y_0 > \lambda)$ and $p_1 = P(Y_1 > \lambda)$; then

$$p_0 = \int_{\lambda}^{+\infty} dF_1(x) = \frac{1}{2}\alpha_0 + \frac{1}{2}\alpha_1, \tag{14}$$

$$p_1 = \int_{\lambda}^{+\infty} dF_r(x) = \Bigl(1 - \frac{r}{2}\Bigr)\alpha_0 + \frac{r}{2}\alpha_1. \tag{15}$$

The difference between them is

$$\Delta p = p_0 - p_1 = \frac{1}{2}(1 - r)(\alpha_1 - \alpha_0) = \frac{1}{2}(1 - r)\Delta\alpha. \tag{16}$$

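When the embedding rate $r$ is less than 1, $\Delta p > 0$ because $\Delta\alpha >
0$. That implies that the correct key samples large noise values with
larger probability than a spurious key does. We call $\Delta p$ the "accordant
advantage". When $\Delta p$ is large enough, we can recover the correct
key. Given $r$, $\Delta p$ is determined by $\Delta\alpha$, so we want to choose
the threshold $\lambda$ that gives the largest $\Delta\alpha$. Define the function

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{+\infty} \exp\Bigl\{ -\frac{t^2}{2} \Bigr\}\, dt. \tag{17}$$

Then $\alpha_0 = Q(\frac{\lambda}{\sigma})$ and $\alpha_1 = Q(\frac{\lambda - 1}{\sigma})$, therefore $\Delta\alpha = Q(\frac{\lambda - 1}{\sigma}) -
Q(\frac{\lambda}{\sigma})$. $\Delta\alpha$ is largest when $\frac{\lambda - 1}{\sigma} = -\frac{\lambda}{\sigma}$, i.e. $\lambda = \frac{1}{2}$. In this
case,

$$\Delta\alpha = Q\Bigl(-\frac{1}{2\sigma}\Bigr) - Q\Bigl(\frac{1}{2\sigma}\Bigr) = 1 - 2Q\Bigl(\frac{1}{2\sigma}\Bigr). \tag{18}$$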
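The claim that the threshold 1/2 maximizes the gap between the two tail probabilities can be checked numerically. The sketch below evaluates the Gaussian tail function through the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the names `q_func`, `delta_alpha` and `lam` (the threshold) are illustrative.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def delta_alpha(lam, sigma):
    """Q((lam - 1) / sigma) - Q(lam / sigma): gap between the tail probabilities."""
    return q_func((lam - 1.0) / sigma) - q_func(lam / sigma)
```

Scanning `lam` over a grid for a fixed `sigma` shows the gap peaking at `lam = 0.5`, where it equals `1 - 2 * q_func(1 / (2 * sigma))`.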

To compute the values of $p_0$ and $p_1$, we also need to estimate the
variance $\sigma^2$. Denote the second moment of the sample $W$ by $a_2$, i.e.
$a_2 = \frac{1}{N} \sum_{i=1}^{N} w_i^2$. Notice that $W$ is a sample from distribution (10),
therefore the result in [25] implies that $a_2 = (1 - \frac{r}{2})(\sigma^2 + 0^2) +
\frac{r}{2}(\sigma^2 + 1^2)$, i.e.

$$\sigma^2 = a_2 - \frac{r}{2}. \tag{19}$$

We take statistic (19) as the estimate of $\sigma^2$.
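The variance estimate and the two tail probabilities can be computed together as sketched below. The code assumes the second-moment relation $a_2 = \sigma^2 + r/2$ for the mixture and the threshold 0.5; the function names are hypothetical, and on real data the estimate should be checked to be positive before taking the square root.

```python
import math


def estimate_sigma2(noise, r):
    """Estimate sigma^2 as the second sample moment minus r / 2."""
    a2 = sum(w * w for w in noise) / len(noise)
    return a2 - r / 2.0


def tail_probabilities(sigma2, r, lam=0.5):
    """Return (p0, p1): P(noise > lam) under the correct and a spurious key."""
    sigma = math.sqrt(sigma2)

    def q(x):
        return 0.5 * math.erfc(x / math.sqrt(2.0))

    a0 = q(lam / sigma)           # tail of N(0, sigma^2)
    a1 = q((lam - 1.0) / sigma)   # tail of N(1, sigma^2)
    p0 = 0.5 * a0 + 0.5 * a1                  # half of the samples modified
    p1 = (1.0 - r / 2.0) * a0 + (r / 2.0) * a1  # fraction r/2 modified
    return p0, p1
```

The difference `p0 - p1` returned here equals half of `(1 - r)` times the gap between the two tails, so it shrinks to zero as `r` approaches 1.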



C. Correlation Attack 

In this section, we borrow the idea of the correlation attack in
cryptanalysis to recover the stego key with the accordant advantage
$\Delta p$. For $k \in \mathcal{K}$, the set of indices generated from the key $k$ is denoted
by $I(k) = \{j_1, j_2, \ldots, j_L\}$. The corresponding sample from the
noise set $W$ obtained with $k$ is $\{w_{j_1}, w_{j_2}, \ldots, w_{j_L}\}$, which can be
viewed as a sequence of i.i.d. (independent and identically distributed)
random variables. Define a new sequence of random variables as

$$Z_i = \begin{cases} 1, & \text{if } w_{j_i} > \lambda \\ 0, & \text{otherwise} \end{cases} \qquad 1 \le i \le L.$$

Therefore the $Z_i$ are also i.i.d. random variables. Construct a sequence
of statistics $\eta_n = \sum_{i=1}^{n} Z_i$, where $1 \le n \le L$. For the correct
key $k_0$, the analysis in Sect. III (B) shows that $P\{Z_i = 1\} = p_0$,
and the Central Limit Theorem implies that the distribution of $\eta_n$ is
approximately the normal distribution $N(np_0, np_0(1 - p_0))$
when $n$ is large enough. Similarly, for an incorrect
key $k$, the distribution of $\eta_n$ is approximately the normal
distribution $N(np_1, np_1(1 - p_1))$ when $n$ is large enough. The
search for the correct key can then be formulated as the following
hypothesis testing problem:

$H_0$: $\eta_n \sim N(np_0, np_0(1 - p_0))$, which means $k$ is the correct
key $k_0$;

$H_1$: $\eta_n \sim N(np_1, np_1(1 - p_1))$, which means $k$ is an incorrect
key.

Select a threshold $T$. If $\eta_n \ge T$, accept $H_0$; otherwise accept $H_1$.

Generally, the larger the number of samples $n$ we use, the more accurate
the decision. However, a larger $n$ means more search
time. We should determine $n$ and the threshold $T$ so as to achieve
suitable probabilities of the false alarm event, $p_f$, and of the missing
event, $p_m$. Using (17), we obtain

$$p_f = Q\Bigl( \frac{T - np_1}{\sqrt{np_1(1 - p_1)}} \Bigr), \qquad p_m = Q\Bigl( \frac{np_0 - T}{\sqrt{np_0(1 - p_0)}} \Bigr). \tag{20}$$

In the present problem, we are mainly concerned with $p_f$. When the number of
all possible stego keys is $|\mathcal{K}|$, $p_f$ is picked as small as $\frac{1}{|\mathcal{K}|}$ so that
the correct key can be determined uniquely. $p_m$ can be chosen
close to zero (for example $10^{-2}$). For given $p_f$ and $p_m$, look up the
table of the standard normal distribution function to get $w_f$ and $w_m$
such that $p_f = Q(w_f)$ and $p_m = Q(w_m)$. Then with (20), we can
compute the needed values of $n$ and $T$ as follows:

$$n = \Bigl( \frac{w_m \sqrt{p_0(1 - p_0)} + w_f \sqrt{p_1(1 - p_1)}}{\Delta p} \Bigr)^2, \tag{21}$$

$$T = w_f \sqrt{np_1(1 - p_1)} + np_1. \tag{22}$$



Note that to get $n$ samples of noise data, $n^*$ ($n^* \approx \frac{n}{r}$) pixels are
needed on average. So combining (16) and (21), we get an
estimate for the number of needed pixels $n^*$:

$$n^* \approx \frac{4 \bigl( w_m \sqrt{p_0(1 - p_0)} + w_f \sqrt{p_1(1 - p_1)} \bigr)^2}{r\,[(1 - r)\Delta\alpha]^2}. \tag{23}$$

Equation (23) shows that $n^* \to \infty$ as $r \to 0$ or 1. In other
words, when the embedding rate $r$ is very small (close to 0) or very
large (close to 1), recovering the stego key becomes
difficult because we do not have enough pixels to use. Notice that this
agrees with the information-theoretic analysis in Sect. II. This
conclusion is also confirmed by the experimental results on "Hide
and Seek 4.1" in the next section.
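The sample size and threshold formulas can be evaluated directly, as in the sketch below. Instead of a printed normal table it inverts the tail function with `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8+); the function names are illustrative, and the inputs are assumed to satisfy `p0 > p1`.

```python
import math
from statistics import NormalDist


def q_inv(p):
    """Return w such that Q(w) = p, using Q(w) = 1 - Phi(w) = Phi(-w)."""
    return -NormalDist().inv_cdf(p)


def required_samples(p0, p1, p_f, p_m):
    """Number of samples n and threshold T for target error probabilities."""
    w_f, w_m = q_inv(p_f), q_inv(p_m)
    s0 = math.sqrt(p0 * (1.0 - p0))
    s1 = math.sqrt(p1 * (1.0 - p1))
    n = ((w_m * s0 + w_f * s1) / (p0 - p1)) ** 2       # sample size formula
    t = w_f * math.sqrt(n * p1 * (1.0 - p1)) + n * p1  # threshold formula
    return n, t


def required_pixels(n, r):
    """Average number of pixels needed to collect n samples at rate r."""
    return n / r
```

In practice `n` would be rounded up to an integer; since `p0 - p1` is proportional to `1 - r` and the pixel count divides by `r`, the result grows without bound at both ends of the rate range.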

With the preparations above, we now describe the attack.
Assume that we have detected a stego image $S$ with $N$ pixels, and
know the details of the steganographic algorithm except the stego key.
The attack proceeds through the following steps.

Algorithm - Correlation Attack

Step 0 1) Estimate the embedded message length $L$ and the
          embedding rate $r$ ($r = L/N$) using the method in [26];
       2) Filter the stego image $S$ and take the noise data set
          $W = \{w_1, w_2, \ldots, w_N\}$ as described in Sect. III
          (A);
       3) Estimate the variance $\sigma^2$ with statistic (19). Let $\lambda =
          0.5$, and compute $p_0$ and $p_1$ using equations (12),
          (13), (14) and (15);
       4) Let $p_f = \frac{1}{|\mathcal{K}|}$, choose a proper $p_m$ (for example
          $10^{-2}$), and pick $w_f$ and $w_m$ such that $p_f =
          Q(w_f)$ and $p_m = Q(w_m)$. Finally compute the
          necessary number of samples $n$ and the threshold $T$
          using (21) and (22).

Step 1 If $n > L$, go to Step 3; otherwise, test all stego keys in
       $\mathcal{K}$: for every $k \in \mathcal{K}$, seed the PRNG with $k$ to generate the
       set of $n$ sample indices $I(k) = \{j_1, j_2, \ldots, j_n\}$
       and extract $n$ samples of noise data $\{w_{j_1}, w_{j_2}, \ldots, w_{j_n}\}$.
       Then count the number $T_k$ of $w_{j_i}$ such that $w_{j_i} > 0.5$,
       i.e. $T_k = |\{w_{j_i} \mid w_{j_i} > 0.5, 1 \le i \le n\}|$. If $T_k < T$,
       reject $k$; otherwise save $k$ to the set $B$, i.e. $B = \{k \mid k \in
       \mathcal{K} \text{ and } T_k \ge T\}$.

Step 2 If $|B| = 1$, take the only key in $B$ as the correct
       key and stop; if $|B| = 0$ or $|B| > 1$, go to Step 3.

Step 3 Let $n = L$. Test all keys in $\mathcal{K}$ as in Step 1 and
       obtain $T_k$ for every $k \in \mathcal{K}$. Write $T_{\max} = \max\{T_k\}$ and
       $D = \{k \mid k \in \mathcal{K} \text{ and } T_k = T_{\max}\}$.

Step 4 If $|D| = 1$, take the only key in $D$ as the correct key
       and stop; if $|D| > 1$, the attack fails and stops.
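Steps 1 to 4 of the algorithm can be sketched as a small key search. This is an illustration under stated assumptions rather than the attack on any particular tool: Python's `random.Random` stands in for the PRNG of the stego system, the key space is any re-iterable collection of candidate seeds, and the function names are hypothetical. The noise data, sample size `n` and threshold are taken as already computed in Step 0.

```python
import random


def path_from_key(key, n_pixels, msg_len):
    """Hypothetical embedding path: msg_len distinct indices from a seeded PRNG."""
    return random.Random(key).sample(range(n_pixels), msg_len)


def count_large(noise, key, msg_len, n, lam=0.5):
    """T_k: how many of the first n samples on the path of `key` exceed lam."""
    path = path_from_key(key, len(noise), msg_len)[:n]
    return sum(1 for j in path if noise[j] > lam)


def correlation_attack(noise, key_space, n, threshold, msg_len):
    """Return the recovered key, or None if the attack fails."""
    if n <= msg_len:
        # Steps 1-2: keep the keys whose count reaches the threshold.
        kept = [k for k in key_space
                if count_large(noise, k, msg_len, n) >= threshold]
        if len(kept) == 1:
            return kept[0]
    # Steps 3-4: use the whole path and take the unique maximum count.
    counts = {k: count_large(noise, k, msg_len, msg_len) for k in key_space}
    best = max(counts.values())
    winners = [k for k, c in counts.items() if c == best]
    return winners[0] if len(winners) == 1 else None
```

The fallback in Steps 3 and 4 needs no threshold at all, which is why it still works when the computed `n` exceeds the message length, at the price of testing every key on the full path.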

IV. Extracting Attack on "Hide and Seek 4.1" 

As an example, we use our method to recover the stego key
of "Hide and Seek 4.1" [22], a typical LSB replacing
steganographic algorithm for GIF files with 256 shades of gray or
color (in fact, the designer of "Hide and Seek" suggests that greyscale
is best by far). The PRNG used in "Hide and Seek" to generate the
embedding path is based on the function "random()" of "Borland
C++ 3.1", and is seeded by a 16-bit seed together with the length of
the message. The hiding program encrypts the header information,
which consists of the 16-bit seed, the length of the message and the
version number, with the IDEA cipher to produce 64 bits of ciphertext,
and embeds them into the LSBs of the first 64 pixels of the GIF file.
The IDEA key is generated from a password of at most
8 characters (64 bits). Therefore the receiver, who knows the
password, can decipher the header information to get the seed and the
length of the message, which then seed the PRNG to extract the hidden
message.
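
The role of the seed and the message length can be illustrated with a
minimal sketch of LSB replacement along a pseudo-random path. It is not
the code of "Hide and Seek": Python's `random` module replaces the
Borland "random()" function, the way the seed and the length are
combined in `embedding_path` is a hypothetical choice, and the IDEA
encryption of the header is omitted (the first 64 pixels are simply
skipped).

```python
import random

HEADER_PIXELS = 64  # the first 64 pixels carry the encrypted header

def embedding_path(seed, msg_bits, num_pixels):
    """Hypothetical path: distinct pixel indices after the header,
    derived from the 16-bit seed and the message length."""
    rng = random.Random(f"{seed}:{msg_bits}")
    return rng.sample(range(HEADER_PIXELS, num_pixels), msg_bits)

def embed(pixels, bits, seed):
    """Replace the LSB of each pixel on the path with one message bit."""
    stego = list(pixels)
    for j, bit in zip(embedding_path(seed, len(bits), len(pixels)), bits):
        stego[j] = (stego[j] & ~1) | bit
    return stego

def extract(stego, seed, msg_bits):
    """Read the LSBs back along the same path."""
    return [stego[j] & 1 for j in embedding_path(seed, msg_bits, len(stego))]

if __name__ == "__main__":
    rng = random.Random(1)
    cover = [rng.randrange(256) for _ in range(1000)]
    message = [rng.randrange(2) for _ in range(200)]
    stego = embed(cover, message, seed=0xBEEF)
    print(extract(stego, 0xBEEF, len(message)) == message)
```

Anyone who knows the seed and the length regenerates the same path and
reads the message, which is why these two values together act as the
stego key in the attack.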

It is hard to recover the 64-bit IDEA key, but we can skip the
first 64 pixels and recover the key of the PRNG directly with the
"Correlation Attack". "Hide and Seek 4.1" uses only GIF images with
320 × 480 pixels, so the maximum length of the message is defined as
19000 bytes.¹ The approach of [26] can estimate the embedding rate with
an error within ±0.02, so at most about 760 (19000 × 0.04) possible
lengths need to be tested when searching for the key. In other words,
the cardinality of the key space we search is $2^{16} \times 760$, i.e.
the length of the virtual key is only about 26 bits
($16 + \log_2 760 \approx 25.57$).

¹ In "Hide and Seek", when used as a part of the key, the message
length is measured in bytes.

We ran the experiment on 40 256-greyscale GIF files for
several embedding rates. The correct key can be
determined when the embedding rate $r$ satisfies $5.3\% \le r \le 94.7\%$.
However, because the image used by "Hide and Seek" is small (only
320 × 480 pixels), for $|\mathcal{K}| = 2^{16} \times 760$ the number of
needed samples $n$ is usually larger than $L$, and the algorithm has to
carry out Step 3. To test the estimates of $n$ and $T$ given by (21) and
(22), we also ran the experiment under the assumption that the length of
the message is known, which means the key is only the 16-bit seed. In
this case, for $r$ such that $1.1\% \le r \le 98.4\%$, we can recover the
correct key successfully. Plain text and cipher text were each embedded
with "Hide and Seek 4.1" for the experiments, and the attacking results
are similar. The experiments were run on Pentium IV machines
at 2.4 GHz with 512 MB RAM, at a search rate of 250-8400 keys per
second. The search speed is greatly influenced by the
embedding rate.
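
The key-space arithmetic and the quoted search rates can be combined
into a rough estimate of the exhaustive search time. The sketch below
is a back-of-the-envelope calculation only: it assumes a constant
search rate, whereas the text notes that the speed depends strongly on
the embedding rate, and the function name `exhaustive_search_hours` is
introduced here for illustration.

```python
import math

MAX_LEN_BYTES = 19000   # maximum message length in "Hide and Seek 4.1"
RATE_ERROR = 0.02       # error of the embedding-rate estimate (+/-)
SEED_BITS = 16          # size of the PRNG seed

# Candidate message lengths inside the +/-0.02 window: 19000 * 0.04.
candidate_lengths = round(MAX_LEN_BYTES * 2 * RATE_ERROR)
key_space = 2**SEED_BITS * candidate_lengths
virtual_key_bits = math.log2(key_space)

def exhaustive_search_hours(num_keys, keys_per_second):
    """Worst-case time to test every key at a constant search rate."""
    return num_keys / keys_per_second / 3600

if __name__ == "__main__":
    print(candidate_lengths, key_space, round(virtual_key_bits, 2))
    for rate in (250, 8400):
        print(rate, round(exhaustive_search_hours(key_space, rate), 1))
```

Under this simplification the full key space of about 26 bits takes
between roughly 1.6 and 55 hours to exhaust at the two extreme rates,
while the 16-bit seed alone takes at most a few minutes.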

The detailed results of the experiments on lena.gif and peppers.gif,
when the key is only the 16-bit seed, are listed in Table III and Table
IV respectively. In the tables, "-" means that the estimated number of
samples $n$ is larger than the length of message $L$, so the attack
proceeds to Step 3; a $T_{k_0}$ marked with "*" is smaller than the
threshold $T$ and $|B|$ is zero, so the attack also proceeds to Step 3.
The tables show that when $r$ satisfies $10.5\% \le r \le 52.6\%$ (i.e.
$2000 \le L \le 10000$), the necessary number of samples $n$ is smaller
than the length of message $L$, and the attacking procedure stops
successfully in Step 2. In this case, the search is 10% to 45% faster
than with $n = L$ set directly. Note also that $T_{k_0}$ is larger than
but close to the threshold $T$, which implies that the necessary number
of samples $n$ and the threshold $T$ obtained with (21) and (22) are
accurate.

On the whole, the attacking process needs more data for
smaller or larger embedding rates $r$, and when $r \to 0$ (or $r \to 1$)
the attack fails, which verifies the information-theoretic conclusion
in Sect. II once more.
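
The correspondence between message lengths and embedding rates used in
this section can be checked numerically. The relation below, the rate as
the message length divided by the 19000-byte maximum, is an assumption
inferred from the tabulated values rather than a definition stated in
the text; the dictionary simply copies the length and rate columns of
Tables III and IV.

```python
MAX_LEN_BYTES = 19000  # maximum message length in "Hide and Seek 4.1"

def embedding_rate(length_bytes):
    """Assumed relation: rate = message length / maximum message length."""
    return length_bytes / MAX_LEN_BYTES

# (length in bytes, rate) pairs as listed in Tables III and IV.
TABLE_RATES = {
    50: 0.003, 100: 0.005, 200: 0.011, 1000: 0.053, 2000: 0.105,
    3000: 0.158, 5000: 0.263, 8000: 0.421, 9000: 0.474, 10000: 0.526,
    12000: 0.632, 13000: 0.684, 18700: 0.984, 18800: 0.989,
}

def mismatches(table=TABLE_RATES):
    """Lengths whose tabulated rate differs from the assumed relation."""
    return [length for length, rate in table.items()
            if abs(round(embedding_rate(length), 3) - rate) > 1e-9]

if __name__ == "__main__":
    print(mismatches())
    print(round(embedding_rate(2000), 3), round(embedding_rate(10000), 3))
```

Under this assumption the rates 10.5% and 52.6% correspond to messages
of 2000 and 10000 bytes, matching the first and last table rows in which
$n$ is smaller than $L$ for both images.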




Fig. 2. lena.gif Fig. 3. peppers.gif 



V. Conclusion 

In the field of steganalysis, there has so far been much literature
on detecting attacks but little on extracting attacks.
The latter also deserves close attention, because it is a problem
that a cryptanalyst has to face. In this paper, we make a preliminary
analysis of this problem using an information-theoretic method that is
an analogue of Shannon's for cryptography [27]. The results
give a general idea of the extracting attack on steganography.

Our basic idea is that the extracting attack is in principle a
kind of cryptanalysis, and it should rely on both steganalysis and
cryptanalysis. As an example, we present an effective extracting
method on popular LSB replacing steganography of spatial images by
combining the detecting technique of steganalysis with the correlation
attack technique of cryptanalysis. The analysis of our extracting
method and the experimental results on "Hide and Seek 4.1" are both
accordant with the information-theoretic conclusion.

Better lower bounds on the unicity distance of the stego key for the
stego-only attack, and for attacks under other conditions, are
interesting problems that we will study. Our further work will also
include exploring extracting approaches for other kinds of
steganographic algorithms.

TABLE III
Experimental results on lena.gif

Length of    Embedding   Number of    Threshold   T_{k_0} for the    Result of
message L    rate r      samples n    T           correct key k_0    attack
(bytes)                  (bytes)
---------------------------------------------------------------------------
100          0.005       -            -           -                  Fail
200          0.011       -            -           -                  Succeed
1000         0.053       -            -           -                  Succeed
2000         0.105       1830         6925        7086               Succeed
3000         0.158       2066         7845        8040               Succeed
5000         0.263       2699         10319       10466              Succeed
8000         0.421       4374         16888       16922              Succeed
9000         0.474       5293         20503       20560              Succeed
10000        0.526       6535         24086       25398              Succeed
12000        0.632       10805        41954       42265              Succeed
13000        0.684       -            -           -                  Succeed
18700        0.984       -            -           -                  Succeed
18800        0.989       -            -           -                  Fail


TABLE IV
Experimental results on peppers.gif

Length of    Embedding   Number of    Threshold   T_{k_0} for the    Result of
message L    rate r      samples n    T           correct key k_0    attack
(bytes)                  (bytes)
---------------------------------------------------------------------------
50           0.003       -            -           -                  Fail
100          0.005       -            -           -                  Succeed
200          0.011       -            -           -                  Succeed
1000         0.053       -            -           -                  Succeed
2000         0.105       1301         4874        5008               Succeed
3000         0.158       1470         5527        5669               Succeed
5000         0.263       1921         7341        7982               Succeed
8000         0.421       3111         11894       11933              Succeed
9000         0.474       3764         14368       14493              Succeed
10000        0.526       4648         17889       17964              Succeed
12000        0.632       7680         29912       29514*             Succeed
13000        0.684       -            -           -                  Succeed
18700        0.984       -            -           -                  Succeed
18800        0.989       -            -           -                  Fail